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1  SUMMARY 

We  are  exploring  computing  technologies  for  simulating  quantum-mechanical  phenomena  and 
developing  software  tools  to  aid  in  the  development  of  quantum  computers.  The  project  seeks 
new  algorithms  and  software  tools  to  simulate  certain  classes  of  quantum  circuits  with  improved 
efficiency.  Such  circuits  arise  when  enriching  arbitrary  quantum  circuits  with  quantum  error- 
correction  codes,  and  may  also  be  generated  by  restructuring  more  generic  quantum  circuits  to 
improve  the  efficiency  of  quantum  simulation.  The  project  is  structured  around  extensions  of  the 
so-called  stabilizer  formalism,  making  the  resulting  computation  more  efficient  and  amenable  to 
parallel  processing. 

2  INTRODUCTION 

The  work  reported  includes  algorithmic  techniques  and  methodologies  to  simulate  quantum 
circuits  with  improved  efficiency.  Our  key  objective  was  to  develop  and  evaluate  new  principles, 
algorithms  and  software  for  high-perfonnance  simulation  of  quantum  circuits  -  a  widely 
accepted  computational  model  for  quantum  computation  and  communication 

The  construction  of  computer  algorithms  and  software  models  that  simulate  physical  systems 
plays  a  fundamental  role  in  all  branches  of  science  and  engineering,  and  plays  a  special  role  in 
attempts  to  achieve  competitive  advantage  over  non-quantum  techniques.  In  particular,  it  was 
observed  in  the  1980s  that  the  important  task  of  simulating  quantum-mechanical  processes  on  a 
standard  computer  requires  an  extraordinary  amount  of  computer  memory  and  runtime.  Such 
observations  gave  rise  to  the  notion  of  quantum  computing,  where  quantum  mechanics  itself  is 
used  to  simulate  naturally-occurring  quantum  properties  and  phenomena.  The  key  insight  is  to 
replace  the  familiar  0  and  1  bits  of  conventional  computing  with  information  units  called  qubits 
(quantum  bits)  that  capture  quantum  states  of  elementary  particles  or  atomic  nuclei.  By  operating 
on  qubits,  a  quantum  computer  can,  in  principle,  process  exponentially  more  data  than  a  classical 
computer  in  a  similar  number  of  steps.  Existing  quantum  algorithms  exponentially  outperfonn 
best  known  techniques  for  key  tasks  in  code-breaking  [1]  [2].  In  addition  to  quantum 
computation,  quantum  communication  exhibits  unique  properties,  such  as  automatic  detection  of 
attempts  to  eavesdrop.  High-speed  quantum  communication  systems  have  been  built  by  the 
National  Institute  of  Standards  and  Technology,  several  DARPA  contractors  (notably,  BBN 
Technologies)  and  start-ups  in  the  US  and  Europe.  Some  of  these  systems  are  currently  available 
commercially,  and  others  are  operated  24x7  for  research  purposes  and  as  testbeds  for  network 
protocol  development. 

The  most  serious  fundamental  obstacle  to  practical  quantum  infonnation  processing  known 
today  is  the  inherent  instability  of  qubits.  This  obstacle  is  traditionally  addressed  by  quantum 
error-correction  techniques  [3]  [4],  which  are  generally  available  but  require  significant  overhead 
and  require  adaptation  to  individual  quantum  circuits.  In  order  to  scale  quantum  infonnation 
processing,  including  computation  and  communication,  to  large  and  complex  systems,  quantum 
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device  operation  is  compactly  captured  by  the  abstract  fonnalism  of  quantum  circuits  [5].  These 
circuits  consist  of  interconnected  quantum  gates  that  act  on  qubits.  They  can  be  composed  in  a 
hierarchical  manner,  using  design  techniques  similar  to  those  used  in  digital  logic  design. 
However,  as  quantum  circuits  offer  a  much  broader  range  of  infonnation-processing  possibilities, 
their  design  and  evaluation  entail  a  dramatic  increase  of  complexity,  requiring  new  levels  of 
sophistication  in  design  algorithms  and  tools  [6].  A  particularly  important  class  of  design  tools 
perfonns  simulation  of  quantum  circuits  on  conventional  workstations,  i.e.,  these  tools  produce 
representative  outputs  of  ideal  quantum  circuits  on  particular  inputs,  but  without  requiring 
quantum  hardware. 

Quantum  simulation  tools  typically  consist  of  a  front-end  and  a  back-end  [7].  The  front-end 
facilitates  the  development  of  quantum  software  and  the  back-end  acts  as  a  temporary 
replacement  of  (hardware)  quantum  processing  units  to  run  such  software.  Once  quantum 
hardware  is  available,  it  can  be  used  in  conjunction  with  pre-existing  front-end  to  run  the 
accumulated  software  with  increased  efficiency.  In  some  cases,  quantum  simulation  can 
demonstrate  that  quantum  software  does  not  bring  competitive  advantage  to  quantum  hardware 
over  conventional  CPUs.  This  project  develops  new  mathematical  and  algorithmic  concepts  for 
efficient  simulation  of  quantum  circuits.  These  concepts  are  being  implemented  in  software  and 
will  help  evaluating  architectures  for  error-corrected  quantum  communication  and  computation. 

We  have  also  begun  to  investigate  a  highly  unconventional  tool  for  simulating  quantum  circuits 
that  employs  stochastic  computing  (SC)  [8]  [9].  SC  processes  infonnation  in  the  fonn  of  random 
bit-streams  that  are  interpreted  as  probabilities  and  are  implemented  by  relatively  simple  logical 
structures.  Preliminary  results  demonstrate  that  the  technique  is  feasible,  and  that  useful  trade¬ 
offs  between  model  size,  run-time  and  accuracy  can  be  achieved. 


3  METHODS,  ASSUMPTIONS,  AND  PROCEDURES 

This  effort  builds  upon  the  software  infrastructure  we  developed  as  part  of  previous  AFRL- 
funded  research.  Unlike,  other  quantum-circuit  simulators,  stabilizer-based  techniques  can 
process  the  so-called  stabilizer  gates  (Figure  1)  very  efficiently. 
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Figure  1.  Controlled-CNOT,  Hadamard  and  Phase  gates. 

The  stabilizer  formalism  is  a  simulation  method  involving  the  Heisenberg  model  [8]  often  used 
by  physicists  to  describe  atomic-scale  phenomena.  In  this  model,  one  keeps  track  of  the 
symmetries  of  an  object  rather  than  represent  the  object  explicitly.  In  the  context  of  quantum- 
circuit  simulation,  this  model  represents  quantum  states  by  their  symmetries,  rather  than 
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complex-valued  vectors  and  amplitudes.  The  symmetries  are  Pauli  operators  (tensor  products  of 
the  one-qubit  Pauli  matrices  shown  in  Figure  2  along  with  the  identity)  for  which  these  states  are 
1 -eigenvectors,  i.e.,  the  Pauli  operators  that  stabilize  such  states.  Algebraically,  these  symmetries 
form  a  group  structure,  which  can  be  specified  compactly  by  group  generators  [9].  Therefore, 
the  formalism  represents  an  n-qubit  pure  quantum  state  \ip)  by  its  stabilizer  group  S(\ip))  -  the 
group  of  n-qubit  Pauli  operators  that  stabilizes \xp). 

0 )  z  =  Q  -°i) 

Figure  2.  The  one-qubit  Pauli  operators. 

Theorem  1.  For  an  n-qubit  pure  state  \xp)  and  k  <  n,  S(|i/>))  =  Ifk  —  n,  \xp)  is  specified 
uniquely  by  S(\xp))  and  is  called  a  stabilizer  state. 

Proof.  See  [10]  for  details. 

S(|i/>))  is  specified  by  only  log2  2 71  —  n  irredundant  stabilizer  generators.  Therefore,  an  arbitrary 
n-qubit  stabilizer  state  can  be  represented  by  a  stabilizer  matrix  M  whose  rows  represent  a  set  of 
generators  ■■■ ,  Rn  for  S(\lP))-  (Hence  we  use  the  terms  generator  set  and  stabilizer  matrix 

interchangeably.)  Since  each  RL  is  a  string  of  n  Pauli  literals  (X,  Y,  Z  or  I),  the  size  of  the  matrix 
is  n  x  n.  Each  RL  contains  a  +1  leading  phase  which  is  stored  separately  using  a  binary  vector  of 
size  n.  Thus,  the  storage  cost  for  JVf  is  0(n2),  which  is  an  exponential  improvement  over  the 
0(2n)  cost  often  encountered  in  vector-based  representations.  Tables  1  and  2  list  all  one  and 
two-qubit  stabilizer  states  along  with  their  generator  sets. 

Table  1.  One-qubit  stabilizer  states  and  their  Pauli-matrix  stabilizer. 

Shorthand  notation  lists  only  the  normalized  amplitudes  of  the  basis  states. 
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Table  2.  Sixty  two-qubit  stabilizer  states  and  their  corresponding  Pauli  generators. 
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Note  that  in  Table  2,  the  shorthand  notation  represents  a  stabilizer  state  as  alr  a2,  a3a4  where 
are  the  normalized  amplitudes  of  the  basis  states.  The  basis  states  are  emphasized  in  bold.  The 
first  column  lists  states  whose  generators  do  not  include  an  upfront  minus  sign,  and  other 
columns  introduce  the  signs.  A  sign  change  creates  an  orthogonal  vector.  Therefore,  each  row  of 
the  table  gives  an  orthogonal  basis.  The  cells  in  dark  grey  indicate  stabilizer  states  with  four  non¬ 
zero  basis  amplitudes,  i.e.,  at  A  0  V  i.  The  Z  column  indicates  the  angle  between  that  state  and 
1 00),  which  has  twelve  nearest-neighbor  states  (light  gray)  and  15  orthogonal  states  (1). 


Stabilizer  circuits,  which  consist  exclusively  of  stabilizer  and  measurement  gates,  can  be 
simulated  in  polynomial  time  and  space  on  conventional  computers  because  such  gates  conjugate 
the  Pauli  group  onto  itself.  Thus,  to  simulate  a  stabilizer  gate,  one  simply  updates  the  stabilizer 
matrix  using  a  finite  set  of  rules  for  mapping  Pauli  literals  to  Pauli  literals  [3]  [11]  [12]. 
Stabilizer  simulation  is  limited  by  the  following  constrains:  (i)  the  initial  state  of  the  quantum 
system  must  be  a  computational-basis  or  stabilizer  state  and  (zz)  the  quantum  gates  that  are 
applied  to  the  system  must  be  stabilizer  gates. 

Our  approach  was  to  generalize  stabilizer  simulation  by  relaxing  the  above  constraints.  This 
included  the  design  of  a  novel  data  structure,  called  a  stabilizer  frame,  which  can  be  used  to 
represent  an  arbitrary  quantum  state  as  a  superposition  of  stabilizer  states.  It  is  important  to  note 
that  given  two  physical  quantum  states,  there  is  no  quantum-mechanical  operation  to  combine 
them  in  a  linear  combination  -  such  operations  are  specific  to  the  simulation  context.  Generic 
simulation  algorithms  typically  can  process  arbitrary  linear  combinations,  but  algorithms  that  are 
limited  to  certain  quantum  states  usually  have  trouble  with  linear  combinations. 

Stabilizer  frames  facilitate  simulation  of  almost-stabilizer  circuits,  i.e.,  circuits  that  contain  a 
small  number  of  non-stabilizer  gates.  To  understand  the  advantages  and  limitations  of  our 
approach,  we  conducted  a  comprehensive  analysis  of  the  geometry  of  stabilizer  states  (“On  the 
geometry  of  stabilizer  states,”  QIC  2014)  [10].  This  research  allowed  us  to  identify  the  following 
advantages  when  using  stabilizer-state  superpositions  to  simulate  quantum  circuits  are: 
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•  Large  stabilizer  subcircuits,  such  as  those  encountered  in  quantum  error  correcting 
circuits  and  fault-tolerant  architectures,  are  simulated  with  high  efficiency. 

•  Superpositions  can  be  restructured  and  compressed  on  the  fly  during  simulation  to  reduce 
resource  requirements. 

•  Operations  perfonned  on  such  superpositions  can  be  computed  in  parallel  and  lend 
themselves  to  distributed  or  asynchronous  processing.  The  superpositions  can  be 
partitioned  between  processors,  and  gates  can  be  applied  in  parallel  to  these  partitions. 

Therefore,  the  stabilizer-based  techniques,  algorithms  and  tools  we  developed  can  handle 
arbitrary  gates  beyond  the  stabilizer  formalism,  but  incur  certain  penalties  for  non-stabilizer 
gates.  Furthermore,  we  designed  additional  techniques  to  moderate  these  penalties  through 
algorithmic  innovation,  optimized  software  implementations  and  parallel  processing  (“Quipu: 
high-perfonnance  simulation  of  quantum  circuits  via  stabilizer  frames,”  ICCD  2013). 

4  RESULTS  AND  DISCUSSION 

Three  main  accomplishments  of  this  project  deserve  special  identification:  (z)  our  software 
improvements  to  QuIDDPro  (developed  under  prior  support  from  DARPA  and  AFRL),  (z'z)  our 
results  on  the  geometry  of  stabilizer  sates,  and  (zzz)  our  design  of  new  software  tools  for 
simulation  of  quantum  circuits  and  fault-tolerant  quantum  architectures. 

4.1  Development  and  maintenance  of  software  tools 

The  goal  pursued  in  this  part  of  the  overall  effort  was  to  design  a  compact  data  structure  for 
representing  and  transfonning  quantum  circuits  within  QuIDDPro  in  order  to  improve  overall 
runtime  and  memory  perfonnance  when  simulating  quantum  circuits.  At  the  gate-level,  our  data 
structure  is  designed  to  use  various  storage  schemes  that  range  from  highly  compact  bit- 
representations  to  explicit  matrices  depending  on  the  quantum  gate  that  is  to  be  stored  in 
memory.  It  also  provides  I/O  functions  and  can  be  used  as  an  interface  mechanism  between 
external  synthesis  and  simulation  tools. 

Data  decomposition.  Our  data  structures  specify  the  infonnation  required  to  represent  and 
analyze  quantum  circuits.  This  data  structure  also  provides  access  to  low-level  constructs  that 
represent  quantum  gates  in  the  most  compact  way  possible  given  the  specifications  of  each  gate. 
Quantum  circuits  describe  a  particular  computation  using  a  sequence  of  quantum  gates  that  act 
on  a  set  of  qubits.  Hence,  we  employ  a  data  structure  consisting  of  an  array  of  quantuum  gates. 
Each  gate  is  represented  using  one  of  three  data  structures — compact,  elaborate  or  explicit.  Note 
that  these  data  structures  are  not  disjoint  and  some  build  upon  simpler  data  structures,  ensuring 
efficiency  in  common  cases,  while  also  representing  less  frequently  occurring  gates. 

•  Case  1 :  the  gate  can  be  specified  using  a  64-bitfield  representation  that  contains  the 
necessary  qubit  wire  infonnation  only,  i.e.,  the  operator  matrix  is  not  stored.  We  call  this 
the  compactquantumgate  data  structure.  Most  named  quantum  gates  (e.g.  NOT,  CNOT, 
PHASE,  etc.)  that  act  on  up  to  three-qubits  will  fall  into  this  category. 
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•  Case  2:  the  gate  representation  requires  more  than  64-bits.  We  use  a  larger 
elaboratequantumgate  data  structure  that  stores  all  qubit  wire  information  for  named 
gates  of  any  size  by  reusing  the  64-bit  scheme  described  in  Case  1 .  The  matrix  operator 
itself  is  not  stored.  Most  large  named  gates  that  act  on  more  than  three  qubits  will  fall 
under  this  category. 

•  Case  3:  the  gate  is  represented  explicitly  using  a  matrix  of  complex  numbers  and  the 
qubit  wire  information  is  stored  using  the  schemes  from  either  Case  1  or  Case  2 
depending  on  the  amount  of  wire  infonnation  that  needs  to  be  stored.  We  call  this  the 
explicitquantumgate  data  structure.  Unnamed  and  user-defined  gates  fall  under  this 
category. 

We  implemented  this  data  structure  as  the  interface  mechanism  within  QuIDDPro  to  facilitate 
synthesizing  and  simulating  quantum  circuits.  Specifically,  we  used  this  data  structure  to 
implement  a  native  quantum  compiler  (q-compiler)  in  QuIDDPro.  Analogous  to  the  initial  phase 
of  classical  compilers,  the  q-compiler  creates  the  intermediate  representation  that  is  to  be 
transfonned  in  later  phases  of  the  design  flow.  The  q-compiler  first  parses  the  quantum  circuit 
specification  input  file  and  decides  which  compact  representation  to  use  on  a  per-gate  basis.  Our 
data  structure  is  then  used  to  instantiate  the  intermediate  representation  and  to  keep  track  of  the 
subsequent  circuit  transfonnations.  Since  our  proposed  data  structure  is  compact  by  design,  it  is 
possible  to  load  the  entire  quantum  circuit  representation  in  memory  at  once  thereby  removing 
several  potential  sources  of  inefficiency.  The  current  version  of  QuIDDPro  (3.8)  features  a 
compile  batch-mode  option,  which  acts  as  a  q-compiler  and  outputs  one  intermediate- 
representation  file  for  each  state -vector  manipulated  in  the  high-level  quantum  program  (density 
matrices  are  not  currently  supported).  Figure  3  shows  the  intennediate  representation  for  the 
quantum  Fourier  transform  circuit  that  is  instantiated  from  the  successful  compilation  of  the 
high-level  source  file.  The  circuit  representation  can  be  used  by  external  design  tools  as  required, 
e.g.  for  building  a  graphical  representation  for  the  circuit. 

Simulation  of  quantum  circuits.  After  generating  quantum  circuits  from  a  high-level 
QuIDDPro  program,  it  may  be  useful  to  simulate  such  a  circuit.  Given  the  multitude  of  available 
algorithms  for  simulation,  one  should  first  analyze  the  circuit's  structure  to  detennine  the  most 
appropriate  simulation  techniques.  For  example,  in  the  case  of  quantum  stabilizer  circuits,  the 
software  infrastructure  can  employ  efficient  algorithms  from  [11].  In  fact,  our  data  structure  is 
already  used  by  two  different  simulator  back-ends — one  specific  to  stabilizer  circuits  and  one 
entirely  generic,  which  we  call  QuIDDPro-Lite  (QPLite).  Note  that  our  work  takes  on  a  more 
holistic  approach  to  quantum  circuit  simulation  by  being  sensitive  to  the  type  of  circuits  in  each 
case.  In  the  worst  case,  the  exponential  cost  associated  with  simulation  of  arbitrary  quantum 
gates  will  limit  the  range  of  problems  that  the  infrastructure  can  support.  However,  the 
infrastructure  is  flexible  enough  such  that  it  can  be  used  to  simulate  most  practical  quantum 
circuits  and  even  allows  simulation  of  hybrid  computing  platfonns  that  incorporate  both  quantum 
and  classical  constructs.  Our  experiments  showed  that  pre-compiling  and  simulating  a  quantum 
circuit  via  QPLite  is  up  to  4X  faster  than  using  the  native  QuIDDPro  simulator. 
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4.2  On  the  geometry  of  stabilizer  states 

The  goal  pursued  in  this  part  of  the  overall  effort  was  to  provide  a  theoretical  analysis  of  the 
geometric  properties  of  stabilizer  states,  and  design  new  efficient  algorithms  for  computing 
fundamental  geometric  operations  such  as  inner  products  and  wedge  products.  The  main 
contributions  of  this  work  are: 

•  We  quantified  the  distribution  of  angles  between  pairs  of  stabilizer  states  and 
characterized  nearest-neighbor  stabilizer  states. 

•  We  studied  linearly-dependent  sets  of  stabilizer  states  and  showed  that  every  linearly- 
dependent  triplet  of  such  states  that  are  non-parallel  to  each  other  includes  two  pairs  of 
nearest  neighbors  and  one  pair  of  orthogonal  states.  We  illustrate  such  triplets  in  Figure  4. 


APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 

7 


High-level  quantum  program  written  in  QuIDDpro 
for  calculating  the  quantum  Fourier  transform  (QFT) 


#  create  the  state  -vector  with  n  =  3  qttaits 
n  =  3; 

ckt_size  =  n; 
state  -  <±>("0")  ; 
count  —  1; 

virile  (count  <  dctjsize) 
state  =  state  00  db("0")  ; 
oount-H-; 
end 

#  Tpply  Hadanard  and  control  Lad-R_k  gates  to  each  cjioit. 
mrrcpbit  =  1; 

virile  (oirr_qubit  <-  n) 

#  T^rply  a  Hadamard  gate. 

state  =  cu_gate(H(l) ,  "x"-tcurr_qtfaLt,  ckt_size)  *state  ; 
targ  =  curr_qiat; 
ctrl  =  n; 

viiile  ( (Ctrl  >  targ)  &&  (targ  >  0) ) 

#  apply  control  leri-R_k  gates  to  each  gfoit. 
r_k  =  [1  0;  0  exp (-2*pi*i/ (2A (Ctrl  -  targ  +  1) ) ) ] ; 
state  =  cu_gate(r_k,  "c"+ctrl+"x"+targ,  ckt_siae)*state; 
Ctrl — ; 
end 

curr_<jJoit++  ; 
end 

#  iVply  swap  gates  to  reverse  the  order  of  all  ghats. 
curr_swap  =  0; 

virile  (n/2  -  curr  swsp  >=  1) 

state  =  swap(curr_swap  +  1,  n  -  girr_swap,  state)  ; 
currjswapH-; 
end 


q-compile 


Graphical  representation  of  the  QFT  circuit 
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Intermediate  representation  of  the  QFT  quantum  circuit 
compiled  from  the  high-level  program 
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Figure  3.  Sample  output  of  the  intermediate  circuit  representation  compiled  from  high-level  program  source  code. 
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•  We  also  described  an  orthogonalization  procedure  for  stabilizer  states  that  exploits  this 
rather  uniform  nearest-neighbor  structure. 

•  We  showed  that:  (z)  for  any  n-qubit  stabilizer  state  \xp),  there  are  at  least  5(2n  —  1)  states 
\(p)  such  that  \ip  A  (p)  is  a  stabilizer  bivector,  and  (z'z)  the  nonn  of  the  wedge  product 
between  any  two  stabilizer  states  isVl  —  2~k,  where  0  <  k  <  n. 

•  We  explored  the  approximation  of  non-stabilizer  states  by  single  stabilizer  states  and 
short  superpositions  of  stabilizer  states. 

•  We  exploited  our  analysis  of  the  geometry  of  stabilizer  states  to  design  algorithms  for  the 
following  fundamental  tasks:  (z)  synthesizing  new  canonical  stabilizer  circuits  that  reduce 
the  number  of  gates  required  for  QECC  encoding  procedures,  (z'z)  computing  the  inner 
product  between  stabilizer  states,  and  (z'z'z)  computing  stabilizer  bivectors. 


Figure  4.  The  angle  between  any  stabilizer  state  and  its  nearest  neighbors  is  45° 

and  the  distance  is  V 2  —  a/2. 


In  Figure  4,  the  angle  between  any  stabilizer  state  and  its  nearest  neighbors  is  45°  and  the 

distance  isV 2  —  y/2.  Here,  |sx)  is  a  nearest  neighbor  of  both  1 00)  and  1 10).  Similarly,  |s2)  is  a 
nearest  neighbor  of  1 00)  and|01).  The  angle  between  these  two  nearest  neighbors  of  1 00)  is  60°. 
Consider  the  linearly-dependent  triplets  (|00),  1 10),  l^)}  and  {| 00),  1 01),  |s2)}.  Each  set  contains 
two  pairs  of  nearest  neighbors  and  one  pair  of  orthogonal  states. 

Geometric  properties  of  stabilizer  states.  We  define  nearest  neighbor  of  an  n-qubit  stabilizer 
state  | ip)  as  a  state  \cp)  such  that  \(xp\cp)\  —  I/a/2,  the  largest  possible  valued  1.  We  prove  that 
every  stabilizer  state  has  exactly  4(2n  —  1)  nearest  neighbors  and  generalize  this  result  to  the 
case  of  A: -neighbors,  where  0  <  k  <  n.  We  used  these  results  to  quantify  the  distribution  of 
angles  between  any  one  stabilizer  state  and  all  other  stabilizer  states  as  shown  in  Table  3. 
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Table  3.  Distribution  of  inner  products  (angles)  between  any  one  n-qubit  stabilizer  state  and  all 

other  stabilizer  states  for  n  e  {1, 2, ... ,  7). 


~  i  _ ' _ _ • 


n 

N(n) 

Cn(k)/(N(n)  -  1) 

k  -  1 
(45.00°) 

k  =  2 
(60.00°) 

k  =3 
(69.30°) 

k  =  4 
(75.52°) 

fc  =  5 
(79.82°) 

k  =  6 
(82.82°) 

k=  7 
(84.93°) 

± 

(90.00°) 

1 

2 

3 

4 

5 

6 

7 

6 

60 

1080 

36720 

2423520 

315057600 

81284860800 

80% 

20.34% 

2.59% 

0.16% 

0.01% 

«0% 

«  0% 

54.24% 

20.76% 

3.05% 

0.20% 

0.01% 

«0% 

47.45% 

20.92% 

3.27% 

0.23% 

0.01% 

44.62% 

20.96% 

3.39% 

0.24% 

43.27% 

20.97% 

3.44% 

42.60% 

20.97% 

42.27% 

20% 

25.42% 

29.19% 

31.25% 

32.29% 

32.81% 

33.07% 

In  Table  3  J\f(n)  is  the  total  number  of  n-qubit  stabilizer  states.  The  last  column  indicates  the 
ratio  of  orthogonal  (1)  states.  Table  3  shows  that,  for  sufficiently  large  n,  1/3  of  all  stabilizer 
states  are  orthogonal  to  \  ip)  and  the  fraction  of  /c -neighbors  tends  to  zero  for  0  <  k  <  n  —  4. 
Thus,  2/3  of  all  stabilizer  states  are  oblique  to  \xp)  and  this  fraction  is  dominated  by  the  k- 
neighbors,  where  n  —  4  <  k  <  n.  These  findings  suggest  a  rather  unifonn  geometric  structure 
for  stabilizer  states.  Detailed  theorems  and  proofs  describing  these  results  are  included  in  our 
published  manuscript  [10]. 


Embedding  of  stabilizer  geometry  in  the  Hilbert  space.  We  also  describe  how  the  discrete 
embedding  of  stabilizer  geometry  in  Hilbert  space  complicates  several  natural  geometric  tasks. 
As  described  above,  our  results  on  the  geometric  properties  of  stabilizer  states  imply  that  there 
are  significantly  more  stabilizer  states  than  the  dimension  of  the  Hilbert  space  in  which  they  are 
embedded,  and  that  they  are  arranged  in  a  fairly  uniform  pattern.  These  factors  suggest  that,  if 
one  seeks  a  stabilizer  state  closest  to  a  given  arbitrary  quantum  state,  local  search  appears  a 
promising  candidate.  To  the  contrary,  in  Section  4.2  of  [10]  show  that  local  search  does  not 
guarantee  finding  such  stabilizer  states.  The  second  natural  task  we  consider  is  representing  or 
approximating  a  given  arbitrary  quantum  state  by  a  short  linear  combination  of  stabilizer  states. 
Again,  having  considerably  more  stabilizer  states  than  the  dimension  of  the  Hilbert  space  seems 
to  suggest  a  positive  result.  However,  in  Section  4.3  of  [10]  we  demonstrate  a  family  of  quantum 
states  that  exhibits  asymptotic  orthogonality  to  all  stabilizer  states  and  can  thus  be  neither 
represented  nor  approximated  by  short  superpositions  of  stabilizer  states.  Furthermore,  as  the 
following  proposition  shows,  the  maximal  radius  of  any  2n-dimensional  ball  centered  at  a  point 
on  the  unit  sphere  that  does  not  contain  any  n-qubit  stabilizer  states  cannot  exceed  s/2,  but 
approaches  s[2  as  n  -»  oo. 


Proposition  2.  Consider  2n -dimensional  balls  centered  at  a  point  on  the  unit  sphere  that  do  not 
contain  any  n-qubit  stabilizer  states  in  their  interior.  The  radius  of  such  balls  cannot  exceeds! 2, 
but  approaches  s[2  as  n  -»  oo. 

Proof.  Consider  an  arbitrary  ball  B  with  radius  s[2  and  centered  on  the  unit  sphere  as  shown  in 
Figure  5.  B  covers  half  of  the  unit  sphere.  Let  |s)  be  a  stabilizer  state  that  does  not  lie  on  the 
boundary  of  B.  Then,  either  |s)  or  -  |s)  is  inside  B.  Furthermore,  observe  that  the  intersection  of 
the  unit  sphere  and  the  boundary  of  B  is  contained  in  a  hyperplane  of  dimension  n  —  1.  If  all 
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stabilizer  states  were  contained  there,  they  would  not  have  included  a  single  basis  set.  However, 
since  stabilizer  states  contain  the  computational  basis,  they  cannot  all  be  contained  in  that 
intersection.  An  asymptotic  lower  bound  for  the  radius  of  B  is  obtained  using  the  family  of 

/|00)  +  |01)+|10)\ 

stabilizer-evading  states  of  the  fonn  ^ - -j= - J  (Theorem  4.2  in  [10]).  These  states  are 

asymptotically  orthogonal  to  all  n-qubit  stabilizer  states  (n  -»  go).  Therefore,  the  distance  to  their 
closest  stabilizer  state  approaches  V2. 


Figure  5.  A  ball  B  with  radius  V2  centered  on  the  unit  sphere  covers  half  of  the  unit  sphere. 

Every  such  ball  contains  at  least  one  stabilizer  state  (in  most  cases,  half  of  all  stabilizer  states). 

Computational  geometry  of  stabilizer  states.  Angles  between  stabilizer  states  were  discussed 
in  [11],  where  the  authors  describe  possible  values  for  such  angles  and  outline  an  inner-product 
computation  that  involves  the  synthesis  of  a  basis-nonnalization  stabilizer  circuit  that  maps  a 
stabilizer  state  to  a  computational  basis  state.  We  observed  that  this  circuit-synthesis  procedure  is 
the  computational  bottleneck  of  the  algorithm  and  thus  any  improvements  to  this  synthesis 
process  translate  into  increased  perfonnance  of  the  overall  algorithm.  It  was  also  shown  in  [11] 
that,  for  any  unitary  stabilizer  circuit,  there  exists  an  equivalent  block-structured  canonical  circuit 
that  applies  a  block  of  Hadamard  (H)  gates  only,  followed  by  a  block  of  CNOT  (C)  only,  then  a 
block  of  Phase  (P)  gates  only,  and  so  on  in  the  7-block  sequence  H-C-P-C-P-C-H.  The  number  of 
gates  in  any  H  and  P  block  is  at  most  2 n  and  4n,  respectively,  so  the  total  number  of  gates  for 
such  canonical  circuits  is  dominated  by  the  C  blocks,  which  contain  0(n2logn )  gates. 
Therefore,  reducing  the  number  of  C  blocks  (or  those  involving  other  controlled  operations) 
leads  to  more  compact  circuits  that  minimize  the  build-up  of  decoherence  when  implementing 
QECC  encoding  procedures.  Using  an  alternative  representation  for  stabilizer  states,  the  work  in 
[13]  proves  the  existence  of  canonical  circuits  with  the  shorter  sequence  H-C-X-P-CZ,  where  the 
X  and  CZ  blocks  consist  of  NOT  and  Controlled-Z  gates,  respectively.  Such  circuits  contain  only 
two  controlled-op  blocks  (as  compared  to  three  C  blocks  in  the  7-block  sequence  from  [11])  but 
no  algorithms  are  available  in  literature  to  synthesize  these  circuits  for  an  arbitrary  stabilizer 
state.  We  described  an  algorithm  for  synthesizing  canonical  circuits  with  the  block  sequence  H- 
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C-CZ-P-H.  Our  circuits  are  therefore  close  to  the  smallest  known  circuits  proposed  in  [13].  Our 
canonical  circuits  improve  the  computation  of  inner  products,  helping  us  outperform  the  inner- 
product  algorithm  based  on  non-canonical  circuits  proposed  in  [14]  as  show  in  Figure  6. 
Furthermore,  we  leveraged  our  circuit-synthesis  technique  and  inner-product  algorithm  to:  (/) 
compute  stabilizer  bivectors  efficiently  and  (/'/')  derive  orthogonalization  procedure  for  linear 
combinations  stabilizer  states.  These  algorithms  rely  on  the  stabilizer-matrix  representation  for 
quantum  states  and  are  described  in  detail  in  [10]. 


Number  of  qubits 


Number  of  qubits 


(a)  (b) 

Figure  6.  (a)  Runtime  and  (b)  circuit-size  comparisons  between  our  circuit-synthesis  algorirthm 
(Algorithm  5.1)  and  the  circuit  synthesis  portion  of  the  Audenaert-Plenio  inner-product  algorithm. 


Figure  6  shows  that  on  average,  our  algoirthm  runs  roughly  twice  as  fast  and  produces  canonical 
circuits  that  contain  less  than  half  as  many  gates.  Furthermore,  the  Audenaert-Plenio  circuits  are 
not  canonical. 

4.3  Engineering  stabilizer-based  simulation  of  generic  quantum  circuits 

The  goals  pursued  in  this  part  of  the  overall  effort  were  to  leverage  our  results  on  the  geometry 
of  stabilizer  states  (Section  4.1)  in  order  to  design  novel  data  structures  and  algorithms  for 
simulation  of  generic  quantum  circuits  via  the  stabilizer  formalism.  The  stabilizer  gates  by 
themselves  do  not  form  a  universal  set  for  quantum  computation  [11]  [12].  However,  the 
Hadamard  and  Toffoli  (TOF)  gates  do  [15].  To  simulate  TOF  and  other  non-stabilizer  gates,  we 
extend  the  formalism  to  include  the  representation  of  arbitrary  quantum  states  as  superpositions 
of  stabilizer  states. 

Observation  3.  Given  an  n-qubit  stabilizer  state  | ip),  there  exists  an  orthonormal  basis  including 
\rp)  and  consisting  entirely  of  stabilizer  states.  One  such  basis  is  obtained  directly  from  the 
stabilizer  representation  of  \xp)  by  changing  the  signs  of  an  arbitrary,  non-empty  subset  of 
generators  ofS(|i/>)),  i.e.,  by  modifying  the  phase  vector  of  the  stabilizer  matrix  for  \xp).  Thus, 
one  can  produce  2n  —  1  additional  orthogonal  stabilizer  states.  Such  states,  together  with  \xp), 
fonn  an  orthonormal  basis.  This  basis  is  unambiguously  specified  by  a  single  stabilizer  state,  and 
any  one  basis  state  specifies  the  same  basis. 
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Definition  4.  An  n-qubit  stabilizer  frame  T  is  a  set  of  k  <  2n  stabilizer  states  { h/h))y=i  that 
forms  an  orthogonal  subspace  basis  in  the  Hilbert  space.  We  represent  T  by  a  pair  consisting  of 
(/)  a  stabilizer  matrix  M  and  (/'/')  a  set  of  distinct  phase  vectors  {ery}y=1,  where  er,  G  {±l}n.  We 
use  Ma>  to  denote  the  ordered  assignment  of  the  elements  in  Oj  as  the  (±l)-phases  of  the  rows 
inJVf.  Therefore,  state  |i/>;  )  is  represented  byJVf°T  The  size  of  the  frame,  which  we  denote 
by  in  is  equal  to  k. 

Each  phase  vector  Oj  can  be  viewed  as  a  binary  (0-1)  encoding  of  the  integer  index  that  denotes 
the  respective  basis  vector.  Thus,  when  dealing  with  64  qubits  or  less,  a  phase  vector  can  be 
compactly  represented  by  a  64-bit  integer  (modern  CPUs  also  support  128-bit  integers).  To 
represent  an  arbitrary  state  IT7)  using  T,  one  additionally  maintains  a  vector  of  complex 
amplitudes  a  =  (a1;  ...,ak),  which  corresponds  to  the  decomposition  oflTOin  the  basis 
{\fj)}kj=1  defined  by  T,  i.e.,  Ih7)  =  Y/j=i  Q-jl^Pj)  and£;=i  |a;|2  =  1  (see  Figure  7).  Observe  that 
each  aj  forms  a  pair  with  phase  vector  Oj  in  T  since \ifjj)  =  JVf ai .  Any  stabilizer  state  can  be 
viewed  as  a  one-element  frame. 


|T>  =  a1(|00>  +  |01))  +  a2(|10)  +  |ll» 


a  =  (a1(a2) 
Z  I 


M  = 


1  X 


Phase  vectors 

!  °i :(+.  +  )  i — ' 


Ma'  = 


+  r  z 


+  1/ 


Ma* 


=  ~\Z 

+  1/ 


^]  =  |00>  +  |01> 
^]  =  |10>  +  |11> 


Figure  7.  Example  of  a  stabilizer  frame  T  that  represents  |*F).  While  IV)  is  composed  of  four 
computational-basis  amplitudes,  its  frame  representation  has  only  two  phase  vectors. 

Stabilizer  frames  are  the  fundamental  data  structure  in  our  technique.  Prior  work  on  simulation 
of  non-stabilizer  gates  using  the  stabilizer  formalism  can  be  found  in  [11]  where  the  authors 
represent  a  quantum  state  as  a  sum  of  0(42dk)  density-matrix  terms  while  simulating  k  non¬ 
stabilizer  operations  acting  on  d  distinct  qubits.  In  contrast,  the  number  of  states  in  our  technique 
is  0(2k)  although  we  do  not  handle  density  matrices  and  perform  more  sophisticated 
bookkeeping  including  an  0(n2)  algorithm  to  compute  the  global  phase  of  a  stabilizer  state. 
Since  the  stabilizer  formalism  simulates  gates  via  their  action-by-conjugation,  global  phases  are 
not  maintained.  This  is  not  an  issue  when  dealing  with  a  single  stabilizer  state  since  a  global 
phase  does  not  affect  the  statistics  of  measurement.  However,  such  phases  must  be  maintained 
when  dealing  with  stabilizer-state  superpositions,  where  global  phases  become  relative. 

We  designed  the  following  frame  operations,  which  are  efficient  in  the  size  of  the  stabilizer 
frame. 


•  Frame  rotation  -  facilitates  simulation  of  stabilizer  gates.  The  procedure  updates  the 
stabilizer  matrix  and  phase  vectors  associated  with  T  according  to  the  mapping  rules 
specified  by  the  stabilizer  formalism.  Each  element  in  the  vector  of  amplitudes 
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(a1;  ...,ak)  is  also  updated  using  our  global-phase  computation  algorithm.  The 
runtime  of  this  operation  is  0(n2  +  n|!F|). 

•  Frame  cofactoring  -  facilitates  simulation  of  measurements  and  non-stabilizer 
operations  such  as  Toffoli  and  controlled-rotation  gates.  This  procedure  updates  the 
stabilizer  matrix  associated  with  F.  Then  iterates  over  each  phase  vector  in  T  and 
permutes  the  corresponding  phases  in  order  to  generate  additional  phase  vectors 
corresponding  to  cofactor  states,  which  are  output  states  obtained  after  perfonning 
computational-basis  measurements.  As  in  the  case  of  stabilizer  gates,  this  operation  is 
linear  in  the  number  of  phase  vectors.  However,  by  the  end  of  the  operation,  the 
number  of  phase  vectors  (states)  in  T  will  have  grown  by  a  (worst  case)  factor  of  four 
in  the  case  of  both  Toffoli  and  controlled-rotation  gates.  For  an  arbitrary  n-qubit 
stabilizer  frame  F,  the  number  of  phase  vectors  is  upper  bounded  by  2n,  the  number 
of  possible  +1  pennutations. 

Our  simulation  technique,  implemented  in  our  software  package  Quipu,  admits  a  universal  gate 
set  for  quantum  computation  [15]. 

Multiframe  simulation.  Although  a  single  frame  is  sufficient  to  represent  a  stabilizer-state 
superposition,  one  can  sometimes  tame  the  exponential  growth  of  states  in  \ip)  by  constructing  a 
multiframe  representation.  Such  a  representation  cuts  down  the  total  number  of  states  required  to 
represent  | xp)  by  at  least  a  half,  thus  improving  the  scalability  of  our  technique.  Our  experiments 
showed  that,  when  simulating  ripple-carry  adders,  the  number  of  states  in  | xp)  grows  linearly 
when  multiframes  are  used  but  exponentially  when  a  single  frame  is  used.  To  this  end,  we 
introduce  an  additional  frame  operation  called  coalescing  whose  output  is  a  list  of  n-qubit  frames 
{‘■F'1,T'2,  ■  ■■  ,F's]  (i.e.,  a  multiframe)  that  together  represent  the  same  superposition  as  the  original 
input  frame  IF.  Figure  8  shows  an  example. 


Figure  8.  Example  of  how  a  multiframe  representation  is  derived  from  a  single-frame 

representation. 

To  obtain  F1  in  Figure  8,  we  conjugate  the  second  column  of  JVf  by  a  Hadamard  gate  and 
coalesce  the  first  two  phase  vectors  from  IF.  To  obtain  F2,  conjugate  the  second  column  JVf  by 
Hadamard,  then  conjugate  the  second  and  third  columns  by  CNOT,  and  coalesce  the  last  two 
phase  vectors  in  F. 

APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED 

14 


The  overall  simulation  flow  of  Quipu  is  shown  in  Figure  9.  Each  of  the  steps  in  the  flow  is 
described  in  greater  detail  in  [16]  along  with  additional  background  material. 


Figure  9.  Overall  simulation  flow  for  Quipu. 

Unlike  other  techniques  based  on  compact  representations  of  quantum  states  (e.g.,  using  BDD 
data  structures  [7]),  most  frame-based  operations  are  inherently  parallel  and  lend  themselves  to 
multi- threaded  implementation.  The  only  step  in  Figure  9  that  presents  a  bottleneck  for  a  parallel 
implementation  is  the  orthogonalization  procedure,  which  requires  communication  across 
frames.  All  other  processes  can  be  executed  on  large  subsets  of  frames  independently. 

Experimental  results.  Figure  10  shows  that  Quipu  simulates  certain  quantum  arithmetic  circuits 
in  polynomial  time  and  space  for  input  states  consisting  of  equal  superpositions  of 
computational-basis  states.  On  such  instances,  known  linear-algebraic  simulation  techniques, 
such  as  the  (state-of-the-art)  BDD-based  simulator  QuIDDPro,  take  exponential  time.  We 
simulated  quantum  Fourier  transfonn  and  quantum  fault-tolerant  circuits  with  Quipu,  and  the 
results  demonstrate  that  our  stabilizer-based  technique  leads  to  orders-of-magnitude 
improvement  in  runtime  and  memory  as  compared  to  QuIDDPro  (Figure  11).  While  our 
technique  uses  more  sophisticated  mathematics  and  quantum-state  modeling,  it  is  significantly 
easier  to  implement  and  optimize.  In  particular,  our  multithreaded  implementation  of  Quipu 
exhibited  a  2X  speed  up  on  a  four-core  server.  Additional  results  including  comparisons  with 
stabilizer  circuits  and  quantum  fault-tolerant  circuits  are  covered  in  [16]. 
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Figure  10.  (a)  Ripple-carry  (Cuccaro)  adder  for  3-bit  numbers  [17].  (b)  Average  runtime  and 
memory  needed  by  Quipu  and  QuIDDPro  (QPLite)  to  simulate  n-bit  Cuccaro  adders. 

Figure  10(a)  illustrates  the  Ripple-carry  (Cuccaro)  adder  for  3 -bit  numbers  [17].  Figure  10(b) 
shows  the  average  runtime  and  memory  needed  by  Quipu  and  QuIDDPro  (QPLite)  to  simulate  71- 
bit  Cuccaro  adders  after  a  superposition  of  all  computational  basis  states  is  obtained  using  a 
block  of  Fladamard  gates.  The  quadratic  function /(x)  =  0.5248x2  —  15.815x  +  123.86  fits 
Quipu’s  curve  with  R 2  —  .9986. 

In  Figure  11,  we  show  the  average  runtime  and  memory  needed  by  Quipu  (single-threaded  and 
multi-threaded)  and  QuIDDPro  (QPLite)  to  simulate  n-qubit  QFT  circuits,  which  contain 
n(n  +  l)/2  gates.  We  used  the  input  state  for  all  benchmarks. 


n-qubit  QFT  circuit 


n-qubit  QFT  circuit 


Figure  11.  Average  runtime  and  memory  needed  by  Quipu  (single-threaded  and  multi-threaded) 
and  QuIDDPro  (QPLite)  to  simulate  n-qubit  QFT  circuits,  which  contain  n(n  +  l)/2  gates. 


4.4  Simulating  quantum  circuits  via  stochastic  computing. 

We  have  investigated  the  use  of  an  unconventional  technique  called  stochastic  computing  (SC) 
for  the  simulation  of  quantum  circuits  [18]  [19].  SC  processes  information  represented  by 
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random  bit-streams  of  Os  and  Is.  The  bit-streams  are  called  stochastic  numbers  (SNs)  and  are 
interpreted  as  probabilities.  An  //-bit  SN  N  containing  m  Is  has  the  numerical  value  p  =  min, 
which  is  the  probability  of  a  randomly-observed  bit  of  N  being  1 .  In  their  basic  unipolar  fonn, 
SNs  approximate  numbers  lying  in  the  real  interval  [0,1]. 

The  main  attraction  of  SC  is  that  arithmetic  operations  can  be  implemented  by  simple  logic 
circuits;  see  Figure  12.  For  example,  multiplication  of  two  /7-bit  SNs  p\  and  pi  to  form  the 
product  p i*  pi  can  be  realized  in  n  clock  cycles  by  a  two-input  AND  gate,  provided  p\  and  pi  are 
from  uncorrelated  sources.  Addition  is  implemented  by  a  multiplexer  in  the  scaled  fonn  {p\  + 
p2)ll,  which  ensures  that  the  sum  lies  in  the  probability  range  [0,1].  Note  that  the  scaling  factor  is 
supplied  by  an  independent  SN  r  of  value  1/2,  i.e.,  a  purely  random  bit-stream.  The  remaining 
two  circuits  convert  numbers  between  ordinary  binary  form  N  and  stochastic  form  p  =  N/2k. 
They  serve  to  interface  conventional  and  stochastic  circuits,  and  also  to  re -randomize  SNs  that 
have  become  unduly  correlated.  The  (pseudo)  random  number  generator  in  Figure  12c  is 
typically  implemented  by  a  linear  feedback  shift  register  (LFSR)  using  well-known  methods. 
Signed  numbers  can  be  handled  by  “bipolar”  notation,  which  maps  the  SN  range  from  [0,1]  to 
[-1,1].  This  results  in  minor  changes  to  the  component  in  Figure  12;  for  example,  an  XNOR 
gate  rather  than  an  AND  gate  is  needed  to  perform  bipolar  multiplication;  a  multiplexer 
continues  to  serve  as  a  scaled  adder. 


AND  gate 


Multiplexer 

t  —  0  ^ 


Pixp2 


Pi  ■ 
r=  1/2  ■ 


—  (Pi  +  PiV2 
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(b) 


Random  no. 

k 

^generator 

Binary 

k 

number  N 

Comparator 


A  <  S> - p  =  N/2*  p  — 


N 

J± 


-4^ 


Binary 

counter 


(d) 


Figure  12,  A  selection  of  (unipolar)  SC  components:  (a)  multiplier,  (b)  scaled  adder,  (c)  binary-to- 
stochastic  converter,  and  (d)  stochastic-to-binary  converter. 

Stochastic  computing  has  some  drawbacks  that  have  limited  it  to  a  rather  narrow  range  of 
applications.  These  include  long  computing  times,  inaccuracies  due  to  bit-stream  fluctuations, 
and  the  need  for  scaling.  Recently,  however,  some  very  promising  new  applications  for  SC  have 
emerged,  notably  decoding  LDPC  codes  and  image  processing,  which  suggest  that  SC  is  much 
more  practical  than  previously  believed  [18].  A  major  concern  with  SC  has  been  that  run-time, 
as  measured  by  the  number  of  clock  cycles  for  a  computation  step,  tends  to  grow  exponentially 
with  precision.  Doubling  bit-stream  length  from  n  to  2n  increases  precision  by  only  1  bit. 
However,  SC  can  be  much  faster  than  it  appears  for  several  reasons:  the  basic  clock  cycle  needed 
for  operations  like  add  and  multiply  is  very  short,  SC  tasks  can  often  be  parallelized,  and 
precision  can  be  varied  on-the-fly  (progressive  precision). 
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Figure  13.  Stochastic  circuit  to  simulate  multiplication  of  a  complex  vector  (it,  i2)  by  a  complex 

matrix  ( a,b;c,d ). 


Modeling  quantum  circuits.  To  model  a  quantum  circuit,  we  need  to  represent  the  quantum  state 
{a,  b)T  by  SNs,  where  each  of  the  probability  amplitudes  a  and  h  is  a  complex  number  with  real 
and  imaginary  parts  nr  and  «„  respectively.  The  probability  property  \a\  +  \b\~  =  1  constrains  both 
nr  and  n,  to  he  in  the  real  interval  [-1,1],  which  is  precisely  the  range  of  the  bipolar  SNs.  Hence, 
we  use  two  such  SNs  to  represent  one  complex  probability  amplitude,  and  2"  +  2  SNs  to  denote  an 
//-qubit  state.  To  model  a  quantum  gate  operation,  we  map  it  to  a  conventional  logic  circuit  that 
processes  the  SNs  in  an  appropriate  way. 


A  quantum  gate  G  corresponds  to  a  2"  x  2”  unitary  matrix  M,  and  application  of  G  requires 
multiplying  the  current  state  of  the  quantum  circuit  by  M.  For  n  =  1,  we  get 


This  is  converted  to  SC  form  in  which  each  complex  number  is  represented  by  an  SN  pair  (nr,n,): 


ai 1  +  bi2  —  o 1 
ci 1  +  di2  =  o2 

(ar,cq)(irH*)  +  (br.biXit.i?)  = 
(cr,Ci)(#,#)  +  =  (o^  <  of') 

=  ( aril  —  atii )  +  (bri^  —  bti2) 
o ?  —  ( cril  —  Ciif )  +  (drir  —  dti f) 
of  —  ( ari}  +  ajir)  +  (brif  +  b^) 
of  —  ( crif  +  Ciif)  +  ( drif  +  diif) 
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Matrix  multiplication  is  implemented  by  bipolar  multipliers  (XNOR  gates)  and  adders 
(multiplexers);  see  Figure  13.  Four  multipliers  and  two  adders  are  required  for  one  complex 
multiplication.  A  total  of  2"  -  1  stochastic  adders  are  used  for  all  real  components  of  the  state 
vector;  the  same  number  of  adders  is  used  for  the  imaginary  part.  If  the  adders  are  arranged  in  a 
tree,  the  number  of  adders  on  all  paths  is  n,  and  the  resulting  numbers  must  be  multiplied  by  a 
compensating  factor  of  2". 

Experimental  results.  We  applied  the  foregoing  SC-based  simulation  approach  to  quantum 
circuits  of  several  representative  types  [18].  These  include  a  class  of  stabilizer  circuits  called 
GHZ n  (Greenberger-Horne-Zeilinger)  circuits,  which  produce  entangled  states  with  a  desired 
number  of  qubits  n.  Each  target  quantum  circuit  was  automatically  mapped  to  a  stochastic 
version  in  VHDL,  synthesized  from  the  VHDL,  and  then  transferred  to  a  commercial  FPGA 
platform.  Further  infrastructure,  such  as  random  number  generators  and  counters  needed  for 
format  conversion  (Figure  12)  was  also  synthesized. 
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Figure  14.  Stochastic  simulation  of  GHZ3  demonstrating  progressive  precision. 

Figure  14  shows  the  results  of  an  experiment  in  simulating  GHZ3.  This  circuit  has  three  qubits, 
and  therefore  eight  amplitudes.  GHZ3  is  applied  to  the  input  state  |000)  to  produce  the  entangled 
output  state  (1/V2,  0,  0,  0,  0,  0,  0,  1/V2)1.  The  stochastic  model  for  GHZ3  computes  the  state  (ao, 
...,  c/7)1  represented  by  16  SNs,  the  real  and  the  imaginary  parts  of  each  amplitude  a,.  Due  to  the 
scaling  inherent  in  stochastic  addition,  all  entries  are  divided  by  2  =8.  Therefore,  the  outcome 
of  the  simulation  with  no  approximation  error  is  are f  =  (0.125,  0,  0,  0,  0,  0,  0,  0.125)1  where 
0.125  =  (1/V2)  /  4V2.  Figure  14  shows  the  output  values  for  various  SN  lengths.  Here  “aOr” 
stands  for  the  real  component  of  amplitude  a 0.  The  bar  corresponding  to  bit-length  n  =  2  is  less 
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than  0.1,  whereas  the  other  three  bars  are  close  to  the  correct  value  of  0.125.  It  can  be  seen  that 

O  1  /T 

the  values  obtained  for  n  =  2  bear  little  resemblance  to  the  exact  value  aref,  the  values  for  n  =  2 
are  relatively  close  to  are f,  and  the  values  for  n  =  2~  and  n  =  2  perfectly  match  are f.  This  can  be 
quantified  by  defining  the  average  error  as  the  Euclidian  distance  between  the  obtained 
distribution  and  are f.  For  n  =  28,  216,  224  and  232,  the  error  is  0.03662,  0.00360,  0.00019  and  0, 
respectively.  As  expected  very  long  SNs  are  needed  for  high  accuracy,  but  precision  increases  in 
progressive  fashion. 

The  above  results  also  demonstrate  that  the  simulation  complexity  as  measured  by  the  number  of 
SC  components  needed  can  be  orders  of  magnitude  less  than  that  of  a  classical  version.  The  main 
limitation  of  the  SC  approach  is  the  very  long  run-times  required  to  achieve  adequate  accuracy. 
We  have  shown,  however,  that  this  problem  can  be  mitigated  by  exploiting  SC’s  progressive 
precision  property.  Other  optimizations  seem  possible  to  further  reduce  model  size  and  run-time, 
and  are  the  t  of  on-going  research. 

5  CONCLUSIONS 

Our  research  advanced  the  state  of  the  art  in  simulating  quantum  circuits.  In  particular,  we  have 
further  optimized  the  QuIDDPro  software  developed  under  prior  support  from  DARPA  and 
AFRL  and  continued  to  maintain  it.  We  have  also  significantly  broadened  theory  and  algorithmic 
techniques  necessary  to  extend  the  stabilizer  formalism  and  encompass  the  simulation  of  non¬ 
stabilizer  gates,  while  enjoying  the  efficiency  of  the  stabilizer  fonnalism  on  circuits  with  heavy 
quantum  error  correction.  Developing  such  algorithms  in  full  detail  and  implementing  them  as 
reusable  software  components  laid  foundation  for  high-perfonnance  simulation  tools  for 
quantum  circuits  dominated  by  stabilizer  gates.  Computational  experiments  indicate  that  our 
techniques  outperform  state-of-the-art  simulators  for  practical  instances  of  quantum  arithmetic, 
quantum  fault  tolerant  and  quantum  Fourier  transfonn  circuits. 

Through  the  duration  of  our  project  we  have  cultivated  productive  relations  with  other 
research  groups  in  the  industry  and  academia.  Our  software  QuIDDPro  has  been  downloaded  a 
number  of  times  for  educational  and  research  purposes.  It  is  also  commonly  used  in  scholarly 
publications  for  benchmarking  purposes.  We  have  coauthored  an  extensive  journal  paper  on  the 
geometry  of  stabilizer  states  with  Dr.  Andrew  Cross  from  IBM  (formerly  at  SAIC).  PI  Markov 
coauthored  several  papers  with  Dr.  Mehdi  Saeedi  from  USC,  including  one  in  Phys  Rev  A  on 
speeding  up  Shor's  algorithm.  In  2012,  PI  Markov  visited  the  Institute  for  Quantum  Computing 
at  Waterloo  by  invitation,  presented  research  covered  in  this  report,  and  had  discussions  with 
several  researchers  including  the  director  of  the  institute  Prof.  Michele  Mosca,  as  well  as  Dr. 
Dmitri  Maslov  (currently  a  Program  Director  at  the  National  Science  Foundation).  PI  Markov 
was  also  invited  to  visit  Microsoft  Research  Faculty  Summit  in  Redmond  in  2013,  gave  a  related 
presentation  to  the  QuArC  group,  and  interacted  with  Dr.  Krysta  Svore,  Dr.  Burton  Smith,  Dr. 
Dave  Wecker  and  a  number  of  other  Microsoft  researchers,  as  well  as  several  visiting  academic 
researchers  (Prof.  Umesh  Vazirani,  Prof.  Scott  Aaronson,  Prof.  A1  Aho,  Prof.  Joe  Traub).  We  are 
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continuing  technical  interactions  with  the  QuArC  group  at  Microsoft  Research,  including 
benchmarking  of  our  new  simulator  software  against  their  software.  Univ.  of  Michigan 
Technology  Transfer  specialists  are  currently  engaged  with  Microsoft  Business  Development 
specialists  regarding  potential  licensing  of  our  algorithms  and  tools  to  Microsoft. 
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LIST  OF  SYMBOLS,  ABBREVIATIONS,  AND  ACRONYMS 


AFRL  -  Air  Force  Research  Laboratory 

BDD  -  Binary  Decision  Diagram 

DARPA  -  Defense  Advanced  Research  Projects  Agency 

H  -  Hadamard  gate 

CNOT  -  controlled  NOT  gate 

CZ  -  controlled-Z  gate 

P  -  phase  gate 

QECC  -  Quantum  Error  Correcting  Codes 
SC  -  stochastic  computing 
SN  -  stochastic  number 
M  -  stabilizer  matrix 
T  -  stabilizer  frame 
X  -  Pauli -A  matrix  (operator) 

Y  -  Pauli-F  matrix  (operator) 

Z  -  Pauli-Z  matrix  (operator) 
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